Michael Clark
Statistician Lead
Consulting for Statistics, Computing and Analytics Research
Advanced Research Computing
Nature abhors the linear.
We may come across a situation where the target of interest \(y\) is a function of some covariate \(x\), whose effect is not straightforward in a linear sense. In a standard regression we can generally write the model as follows:
\[y = f(x) + e\]
If we are talking about a linear relationship between \(y\) and \(x\), \(f(x)\) might be a simple sum of the covariates.
\[y = b_0 + b_1x + e\]
In that case we could do our standard regression model. Now consider the following functional form for x:
\[f(x) = \sin(2(4x-2)) + 2e^{-(16^2)(x-.5)^2}\] \[e \sim N(0,.3^2)\]
set.seed(123)
x = runif(500)
mu = sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2))
y = rnorm(500, mu, .3)
d = data.frame(x, y)
Let’s take a look at it visually.
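A minimal base-R sketch of that plot (the original figure may use a different graphics system; the true mean curve is overlaid for reference):

```r
# Recreate the simulated data and plot y against x with the true mean overlaid
set.seed(123)
x  = runif(500)
mu = sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2))
y  = rnorm(500, mu, .3)

plot(x, y, col='gray70', pch=19, cex=.5)
curve(sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), add=TRUE, lwd=2)
```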
In the past, people would try to use polynomial regression, e.g. fitting a quadratic or cubic function within the standard regression framework. However, this is unrealistic at best, and at worst isn’t useful for complex relationships. In the following, even with a polynomial of degree 15 the fit is fairly poor in many areas.
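Such a fit might be produced as follows (using orthogonal polynomials via poly for numerical stability; a sketch, not the document’s exact code):

```r
set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)

polyMod = lm(y ~ poly(x, 15))   # degree-15 polynomial: 16 coefficients in all
summary(polyMod)$r.squared
```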
We can divide the data into chunks at various points (knots), and fit a polynomial within that subset of data.
While this is better, again it is unsatisfactory. The separate fits are unconnected, leading to sometimes notably different predictions for values close together.
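A sketch of that piecewise idea: split the data at a few knots and fit a separate cubic within each chunk (the knot locations here are an assumption for illustration):

```r
set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)
d = data.frame(x, y)

# assign each observation to a chunk, then fit an independent cubic per chunk
d$chunk = cut(d$x, breaks=seq(0, 1, by=.25), include.lowest=TRUE)
fits = lapply(split(d, d$chunk), function(dc) lm(y ~ poly(x, 3), data=dc))
```

Nothing ties the chunks together, so predictions from adjacent fits need not agree at the boundaries.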
Polynomial splines seek to remedy the above problems by creating a smooth function while still taking this piecewise type of approach. To demonstrate, we’ll create a polynomial spline by hand. The following is based on Fahrmeir et al. (2013).
The approach is defined as follows, with \(m\) knots \(\kappa_j\) on the interval \([a,b]\), where \(a=\kappa_1 < ... < \kappa_m = b\):
\[y_i = \gamma_1 + \gamma_2X_i + ... + \gamma_{l+1}X_i^l + \gamma_{l+2}(X_i - \kappa_2)_+^l + ... + \gamma_{l+m-1}(X_i - \kappa_{m-1})_+^l + e_i\]
\[(X_i - \kappa_j)^l_+= \begin{cases} (X_i-\kappa_j)^l & X_i \geq \kappa_j \\ 0 & \textrm{otherwise} \end{cases}\]
Note that there is nothing special about this. It is just a standard linear regression model when everything is said and done. The first part is even a standard polynomial of degree \(l\).
More generally and cleanly: \[y = f(X) + e = \sum_{j=1}^{d}B_j(X)\gamma_j + e\]
Above, \(B\) is a basis function that transforms \(X\) as needed, and the \(\gamma\) are the regression coefficients. But before getting too far let’s see it in action. Here our polynomial spline will be done with degree \(l\) equal to 1, which means that we are just fitting a linear regression.
knots = seq(0, 1, by=.1)
knots = knots[-length(knots)]   # don't need the last value
l = 1
bs = sapply(1:length(knots), function(k) ifelse(x >= knots[k], (x-knots[k])^l, 0))

head(bs)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.2875775 0.1875775 0.08757752 0.0000000 0.000000000 0.0000000 0.0000000 0.00000000 0.0000000 0.00000000
## [2,] 0.7883051 0.6883051 0.58830514 0.4883051 0.388305135 0.2883051 0.1883051 0.08830514 0.0000000 0.00000000
## [3,] 0.4089769 0.3089769 0.20897692 0.1089769 0.008976922 0.0000000 0.0000000 0.00000000 0.0000000 0.00000000
## [4,] 0.8830174 0.7830174 0.68301740 0.5830174 0.483017404 0.3830174 0.2830174 0.18301740 0.0830174 0.00000000
## [5,] 0.9404673 0.8404673 0.74046728 0.6404673 0.540467284 0.4404673 0.3404673 0.24046728 0.1404673 0.04046728
## [6,] 0.0455565 0.0000000 0.00000000 0.0000000 0.000000000 0.0000000 0.0000000 0.00000000 0.0000000 0.00000000
If we plot this against our target variable \(y\), it doesn’t look like much.
If we multiply each basis function by its corresponding regression coefficient, we can start to interpret the result.
bs = data.frame(int=1, bs)   # add an explicit intercept column, giving the 'int' coefficient below
lmMod = lm(y ~ . - 1, data=bs)
bscoefs = coef(lmMod)
bsScaled = sweep(bs, 2, bscoefs, `*`)

head(bsScaled)
## int X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## 1 0.7833339 -1.9086493 -0.2604127 0.3361754 0.0000000 0.0000000 0.00000 0.000000 0.0000000 0.0000000 0.0000000
## 2 0.7833339 -5.2319738 -0.9555696 2.2582705 3.5970435 10.0063385 -11.95372 2.856037 -0.6308213 0.0000000 0.0000000
## 3 0.7833339 -2.7143760 -0.4289507 0.8021797 0.8027659 0.2313287 0.00000 0.000000 0.0000000 0.0000000 0.0000000
## 4 0.7833339 -5.8605782 -1.0870580 2.6218334 4.2947305 12.4470042 -15.88068 4.292544 -1.3074129 -0.1442616 0.0000000
## 5 0.7833339 -6.2418725 -1.1668153 2.8423607 4.7179284 13.9274455 -18.26267 5.163891 -1.7178150 -0.2440938 -0.1373381
## 6 0.7833339 -0.3023581 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.000000 0.0000000 0.0000000 0.0000000
bscoefs
## int X1 X2 X3 X4 X5 X6 X7 X8 X9
## 0.7833339 -6.6369906 -1.3882936 3.8386041 7.3663847 25.7692665 -41.4620449 15.1670683 -7.1436536 -1.7377269
## X10
## -3.3938064
In the plot above, the blue dot represents the global constant (\(\gamma_1\), i.e. our intercept in the formula above). From that point on we have a decreasing function (line). Between .1 and .2 (red line), the coefficient is negative, furthering the already decreasing slope (i.e. making it steeper downward). The fourth coefficient is positive, which means that between .2 and .3 our decreasing trend is lessening. Thus coefficient \(\gamma_j\) tells us the change in slope that occurs at knot \(\kappa_{j-1}\). If this gets you thinking about interactions in more common model settings, you’re on the right track (e.g. adding a quadratic term just lets x interact with itself; the same thing is going on here).
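We can check this interpretation numerically. With degree \(l=1\) the fitted function is piecewise linear, and its slope on the interval beginning at knot \(\kappa_j\) equals the cumulative sum of the first \(j\) slope coefficients. A sketch recreating the setup above:

```r
set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)

knots = seq(0, .9, by=.1)
bs = sapply(knots, function(k) ifelse(x >= k, x - k, 0))
lmMod = lm(y ~ bs)                # the intercept plays the role of the global constant

slopes = cumsum(coef(lmMod)[-1])  # slope on each successive interval between knots

# evaluate the fitted piecewise-linear function directly, and check that the
# slope on (.4, .5) equals the 5th cumulative coefficient
f = function(x0) coef(lmMod)[1] + sum(coef(lmMod)[-1] * pmax(x0 - knots, 0))
(f(.48) - f(.42)) / .06
```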
Finally, if we plot the sum of the basis functions, which is the same as taking the fitted values from a regression on the basis expansion of X, we get the following fit to the data, and we can see the trend our previous plot suggested.
One of the most common approaches uses a cubic spline fit, so we’ll change our polynomial degree to 3. It seems to fit the data well.
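A by-hand cubic version might look like the following (the knot placement mirrors the linear case above; the exact code in the original isn’t shown, so this is a sketch):

```r
set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)

l = 3
knots = seq(.1, .9, by=.1)   # interior knots only; the global x^3 term covers knot 0
X = cbind(1, x, x^2, x^3,
          sapply(knots, function(k) ifelse(x >= k, (x - k)^l, 0)))
cubicMod = lm(y ~ X - 1)     # global cubic plus truncated cubic terms at each knot
summary(cubicMod)$r.squared
```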
Let’s compare it to the gam function from the mgcv package. As we’ll see later, we won’t usually specify the knots directly, and even though we have set things up to be very close to our own approach, the gam function is still doing some things our by-hand approach is not (penalized estimation). We still get pretty close agreement.
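The mgcv call might be specified along these lines (the 'cr' cubic regression spline basis here is an assumption for illustration; mgcv would otherwise pick its defaults):

```r
library(mgcv)   # recommended package, shipped with standard R installations

set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)
d = data.frame(x, y)

gamMod = gam(y ~ s(x, bs='cr'), data=d)  # penalized cubic regression spline
summary(gamMod)$r.sq
```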
We can see that we’re on the right track by using the constructor function within mgcv and a custom function for the truncated power series like the one we’re using above. See the example in the help file for smooth.construct for the underlying truncated power series function. I only show a few of the columns, but our by-hand construction and that used by gam are identical to several decimal places.
l = 3   # now a cubic spline
xs = scale(x, scale=FALSE)
bs = sapply(1:length(knots), function(k) ifelse(x >= knots[k], (x-knots[k])^l, 0))
sm = smoothCon(s(x, bs='tr', k=14), data=d, knots=list(x=knots))[[1]]

head(sm$X[,1:6])
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 -0.20770617 0.043141852 -0.0089608288 2.378290e-02 0.006599976
## [2,] 1 0.29302145 0.085861568 0.0251592810 4.898725e-01 0.326094166
## [3,] 1 -0.08630677 0.007448858 -0.0006428868 6.840635e-02 0.029497019
## [4,] 1 0.38773372 0.150337434 0.0582908920 6.885061e-01 0.480080698
## [5,] 1 0.44518360 0.198188434 0.0882302397 8.318233e-01 0.593693698
## [6,] 1 -0.44972719 0.202254545 -0.0909593678 9.454771e-05 0.000000000
modelMatrix = cbind(1, xs, xs^2, xs^3, bs)
identical(round(sm$X,4), round(modelMatrix,4))
## [1] TRUE
A natural question may arise as to how many knots to use. More knots mean more ‘wiggliness’, as demonstrated here.
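For example, comparing unpenalized spline fits with few versus many degrees of freedom (using splines::bs here as a stand-in for the document’s plots; the df values are arbitrary):

```r
library(splines)

set.seed(123)
x = runif(500)
y = rnorm(500, sin(2*(4*x-2)) + 2*exp(-(16^2)*((x-.5)^2)), .3)

fitFew  = lm(y ~ bs(x, df=5))    # few knots: smoother, may miss the sharp bump
fitMany = lm(y ~ bs(x, df=25))   # many knots: wigglier, chases the data more
c(few=summary(fitFew)$r.squared, many=summary(fitMany)$r.squared)
```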
As an example of other types of smooth terms we might use, here are the basis functions for b-splines. They work notably differently: each basis function is nonzero only over an interval spanning \(l+2\) knots.
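A quick look at a cubic b-spline basis via the base splines package (the knot locations are an assumption for illustration):

```r
library(splines)

xg = seq(0, 1, length.out=200)
knots = seq(.1, .9, by=.1)
B = bs(xg, knots=knots, degree=3)   # cubic b-spline basis: 9 knots + degree 3 -> 12 columns
matplot(xg, B, type='l', lty=1)     # each function is nonzero over only a few adjacent knot intervals
```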